DNase-seq is primarily used to identify nucleosome-depleted DNase I hypersensitive (DHS) sites genome-wide that
correspond to active regulatory elements. However, ~40 yr ago it was demonstrated that DNase I also digests with a ~10-bp
periodicity around nucleosomes, matching the exposure of the DNA minor groove as it wraps around histones. Here, we
use DNase-seq data from 49 samples representing diverse cell types to reveal this digestion pattern at individual loci and
predict genomic locations where nucleosome rotational positioning, the orientation of DNA with respect to the histone
surface, is stably maintained. We call these regions DNase I annotated regions of nucleosome stability (DARNS).
By comparison with MNase-seq experiments, we show that DARNS correspond well to annotated nucleosomes. Interestingly, many
DARNS are positioned over only one side of annotated nucleosomes, suggesting that the periodic digestion pattern
attenuates over the nucleosome dyad. DARNS reproduce the arrangement of nucleosomes around transcription start sites
and are depleted at ubiquitous DHS sites. We also generated DARNS from multiple lymphoblast cell line (LCL) samples.
We found that LCL DARNS were enriched at DHS sites present in most of the original 49 samples but absent in LCLs, while
multi-cell-type DARNS were enriched at LCL-specific DHS sites. This indicates that variably open DHS sites are often
occupied by rotationally stable nucleosomes in cell types where the DHS site is closed. DARNS provide additional
information about precise DNA orientation within individual nucleosomes not available from other nucleosome
positioning assays and contribute to understanding the role of chromatin in gene regulation.

[Supplemental material is available for this article.] 



Most of the human genome, like that of other eukaryotes, is concentrated
in nucleosomes, which consist of ~147 bases of DNA wrapped ~1.7
times around a histone octamer and connected by DNA linkers of
variable length (Richmond and Davey 2003; Wang et al. 2008). As
the basic unit of packaging in chromatin, nucleosomes incorporate
the majority of bases in the genome (Felsenfeld and Groudine 2003;
Valouev et al. 2011). Nucleosomes play a role in regulating gene
transcription by controlling the accessibility of transcription factor
(TF) binding sites at key locations (Albert et al. 2007).

The most common experimental method for genome-wide in vivo
mapping of nucleosomes is to treat nuclei with micrococcal
nuclease followed by high-throughput sequencing (MNase-seq)
(Schones et al. 2008; Chodavarapu et al. 2010; Valouev
et al. 2011). MNase preferentially digests within linkers, resulting
in mononucleosome fragments that can be used to infer the dyad
or center of well-positioned nucleosomes that maintain their exact
coordinates across a population of cells (Noll 1974; Pugh 2010).
Current studies have used MNase or chemical modifications to
improve the resolution of "translational" positioning, which is
defined as the location of the histone core along the DNA (Valouev
et al. 2011; Brogaard et al. 2012). However, it is difficult to determine
the "rotational" positioning (the orientation of the DNA major and
minor groove relative to the histone surface) from these data.
Moreover, analysis of MNase-seq results may overlook "fuzzy"
nucleosomes whose positions fluctuate across a population of cells,
making it difficult to determine the alignment of the dyad. But even
fuzzy nucleosomes may maintain rotational setting preferences as
they undergo translational shifts of ~10 bp (Albert et al. 2007;
Gaffney et al. 2012) that preserve the relationship with underlying
dinucleotide periodicities (Trifonov and Sussman 1980; Satchwell
et al. 1986; Segal et al. 2006). Nucleosomes that are crucial in
regulating gene expression may shift at this period to maintain
orientation as they allow TF access in response to cell conditions
(Hu et al. 2011). In addition, the conservation of rotational
positioning strikes a balance between the limited contribution of
periodic sequence preferences (Satchwell et al. 1986; Segal et al. 2006;
Kaplan et al. 2009) and the statistical positioning theory, which
posits that most nucleosomes are uniformly packed between nucleosome
boundaries formed by promoters, TF binding sites, and
nucleosome-occluding sequences (Mavrich et al. 2008; Zhang et al. 2009).
In the human genome, a previous study estimated that only 20% of
nucleosomes are well-positioned translationally (Valouev et al. 2011),
although the remaining nucleosomes may conserve their rotational
setting and reflect additional regulatory potential.

DNase-seq predicts nucleosome positioning



Here, we present a new, complementary method that uses 
DNase I digestion data to predict regions of nucleosome rotational 
stability. DNase I preferentially digests DNA in nucleosome-de- 
pleted open chromatin regions and is generally used to locate all 
types of active regulatory elements (Wu et al. 1979; Gross and 
Garrard 1988). When coupled with high-throughput sequencing 
(DNase-seq), DNase I hypersensitive (DHS) sites are identified by 
the enrichment in DNase-seq reads. Each DNase-seq read indicates 
a digestion site at a specific locus with single-base resolution (Gross 
and Garrard 1988; Boyle et al. 2008). Early studies documented
that DNase I digestion of mononucleosomes yielded DNA fragments
spaced ~10 bp apart on gel electrophoresis (Noll
1974). By plotting the distribution of distances between pairs of
DNase-seq reads mapping outside of DHS sites, we previously
observed an oscillation at ~10.4 bp (Boyle et al. 2008). Since the
DNA helix completes one full turn in the same period (Wang 
1979), this is consistent with the preference of DNase I to digest in 
the minor groove and its periodic exposure as DNA wraps around 
the nucleosome (Noll 1974; Cousins et al. 2004). Thus, the presence 
of this DNase I digestion pattern appears to be correlated with the 
position of nucleosomes. 

In this study, we created a model that predicts individual 
DNase-annotated regions of nucleosome stability (DARNS) of vary- 
ing length that are consistently occupied by nucleosomes with 
preserved rotational settings. Whereas previous studies demon- 
strated the 10.4-bp DNase I nucleosome pattern as a cumulative 
signal (Noll 1974; Boyle et al. 2008; Gaffney et al. 2012), we further 
this work by revealing that this period can be detected at individual 
loci and used to reverse engineer the location of DARNS. We de- 
termined that the rotational stability was highly conserved across 49 
distinct DNase-seq data sets from diverse cell types. Unexpectedly, 
when we evaluated DARNS against other nucleosome annotations, 
we found that many were positioned on only one side of the nu- 
cleosome without crossing the dyad, where the periodic digestion 
pattern appears to be attenuated. By using recently available DNase- 
seq data for lymphoblastoid cell lines from multiple individuals 
(Degner et al. 2012), we also annotated DARNS in a single cell type. 
We found evidence of dynamic nucleosome positioning across cell- 
type-specific DHS sites compared with the multi-cell-type DARNS. 
Our results provide the first genome-wide annotation of the orien- 
tation of nucleosomes and reveal high-resolution features of nu- 
cleosomes that have precise rotational settings. 

Results 

Periodic DNase I digestion pattern is conserved across cell types 

The 10.4-bp spacing of aligned DNase-seq reads was previously 
shown in genome-wide aggregate plots within a single-cell-type 
DNase-seq experiment (Boyle et al. 2008) but has been difficult to 
identify at individual loci due to the lack of sufficient read cover- 
age. To increase overall read coverage, we integrated DNase-seq 
results from multiple cell types, isolated from various tissues in 
different individuals grown in nonidentical conditions (for list 
of cell types, see Methods) that individually demonstrated the 
10.4-bp periodicity between DNase I digestion sites (data not 
shown). To verify that this periodicity was conserved between 
experiments, we plotted the pairwise spacing between reads from 
different cell types. The ~10-bp oscillation in the plots was main- 
tained for reads originating in human umbilical vein endothelial 
cell line (HUVEC) compared with reads on the same strand from 
the GM12878 lymphoblast cell line (LCL) (Fig. 1A); similar results

Figure 1. Characteristics of DNase-seq 10-bp periodicity pattern. (A)
Distances between reads from two different cell types (human umbilical
vein endothelial cell line [HUVEC] and GM12878 lymphoblastoid cell line
[LCL]) are plotted and show the 10-bp periodic spacing preference. We
show lines both displaying distances from HUVEC reads to LCL reads and
vice versa. (B) Similar read spacing plots showing 10-bp period is present
in combined data from multiple cell types in DNase-seq data but absent in
FAIRE-seq (Formaldehyde-Assisted Identification of Regulatory Elements)
or permuted DNase data. (C) Read spacing plot showing 10-bp period in
LCL DNase-seq reads overlapping non-LCL DNase I hypersensitive (DHS)
sites but not in LCL-specific DHS sites.



were obtained from all pairs of cell types tested (data not shown). 
Thus, the 10.4-bp spacing in digestion sites is generally in phase 
across cell types, suggesting that rotational positioning is widely 
conserved. 
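
The read-spacing plots described here can be reproduced in outline as follows. This is a minimal sketch rather than the pipeline used in the study: the function name `spacing_histogram`, the `max_dist` cutoff, and the representation of reads as an array of cut-site coordinates on a single strand are assumptions made for illustration.

```python
import numpy as np

def spacing_histogram(cuts_a, cuts_b, max_dist=100):
    """Count read pairs (a, b) with 1 <= b - a <= max_dist, indexed by distance.

    cuts_a, cuts_b: integer cut-site coordinates on one strand, one entry
    per read (duplicates allowed). Hypothetical input format.
    """
    a = np.sort(np.asarray(cuts_a))
    b = np.sort(np.asarray(cuts_b))
    hist = np.zeros(max_dist + 1, dtype=np.int64)
    # For every read in a, locate the slice of b lying 1..max_dist bp downstream.
    lo = np.searchsorted(b, a + 1, side="left")
    hi = np.searchsorted(b, a + max_dist, side="right")
    for pos, i, j in zip(a, lo, hi):
        distances = b[i:j] - pos
        hist += np.bincount(distances, minlength=max_dist + 1)
    return hist  # hist[d] = number of pairs separated by exactly d bp
```

Passing reads from one cell type as `cuts_a` and reads from another as `cuts_b` gives a cross-sample curve of the kind described for HUVEC and GM12878; passing the same array twice gives the within-sample version. A ~10-bp oscillation would appear as regularly spaced maxima in `hist`.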

We proceeded to combine DNase-seq data from 49 samples
representing 43 distinct cell types generated by our group at Duke
as part of the ENCODE project (for combination algorithm, see
Methods) (McDaniell et al. 2010; Song et al. 2011; The ENCODE
Project Consortium 2012; Thurman et al. 2012). We compiled only
the uniquely aligned reads from each of the DNase-seq experiments,
totaling ~1.5 billion data points on either strand. The
distribution of distances between pairs of reads in the combined data
set exhibited the expected spacing on both the negative (Fig. 1B)
and positive (data not shown) strands with an estimated period of
10.45 ± 0.21 bp. When we sequenced a library of naked genomic DNA
that was digested with DNase I, we found that the read spacing
from these data did not exhibit periodicity (Supplemental Fig.
S1A). In addition, we found that positive-strand digestion tended
to be ~3 bp offset from the negative-strand digestion
(Supplemental Fig. S1B). This is consistent with our previous results
reported in one cell type (Boyle et al. 2008) and is likely associated
with the 3' overhang previously documented at DNase I cut sites
(Sollner-Webb et al. 1978; Cousins et al. 2004). We also noted that,
unlike similar periodic profiles seen in MNase-seq studies that
demonstrated a substantial increase in signal at ~150 bp (Valouev
et al. 2011), the periodic spacing of DNase I reads attenuated as the
distance between reads increased, with no significant rise at the
length of a nucleosome. These results support periodic DNase I
digestion along nucleosome-bound DNA, as opposed to MNase
digestion primarily within the linker.

To ensure these results were not due to chance, we generated 
a pair of negative control data sets. First, we permuted the base 
positions of the combined DNase-seq data within 1000-bp windows. 
This effectively disrupted local spacing patterns while maintaining 
local distributions of read counts. For this permuted data set, no 
obvious periodicity was observed, but rather a more uniform
distribution of read pair distances was found (Fig. 1B). Second, to confirm
that the 10-bp period of digestion sites was unique to DNase-seq 
digestion and not an inherent property of open chromatin assays, we 
performed a similar analysis on FAIRE-seq (Formaldehyde-Assisted 
Identification of Regulatory Elements) data, which similarly em- 
ploys high-throughput sequencing to identify open chromatin 
(Giresi et al. 2007). Since FAIRE-seq uses random sonication to 
fractionate the genome, we expected that the positions of aligned 
reads from the ends of these fragments would be unrelated to fine 
nucleosome positioning. We combined FAIRE-seq data generated 
from 19 samples shared with the DNase-seq data as part of the 
ENCODE project, totaling ~830 million data points on each strand
(Song et al. 2011; M. Schaner, J.M. Simon, K.A. Showers, Z. Zhang,
P.G. Giresi, L. Song, D. London, T.S. Furey, G.E. Crawford, J.D. Lieb,
unpubl.). When the distances between pairs of FAIRE-seq reads were
plotted as above, no oscillation signal was evident (Fig. 1B). Hence,
the 10-bp spacing of reads appears unique to the specificity of
digestion sites of the DNase I enzyme.
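
The permutation control can be sketched as below. The sketch assumes that the data are held as a per-base array of read counts and that permuting base positions means shuffling those counts among the positions of each non-overlapping 1000-bp window; the function name and the `seed` parameter are illustrative and do not come from the study.

```python
import numpy as np

def permute_within_windows(counts, window=1000, seed=0):
    """Shuffle per-base read counts inside consecutive non-overlapping windows.

    Each window keeps exactly the same multiset of counts (so local read
    depth is preserved), but the positions of those counts are randomized,
    which destroys any fine-scale spacing pattern.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    out = counts.copy()
    for start in range(0, len(counts), window):
        out[start:start + window] = rng.permutation(counts[start:start + window])
    return out
```

Because each window is shuffled independently, the total number of reads per 1000 bp is unchanged, which is what allows the permuted set to serve as a depth-matched negative control.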

Periodic DNase I digestion patterns are absent 
in nucleosome-depleted DHS regions 

In our previous study, we showed that the 10.4-bp period in DNase 
I digestion disappeared when only DNase-seq reads originating 
from DHS sites in that cell type were included (Boyle et al. 2008). 
Similarly, we separately plotted the spacing between pairs of reads
that mapped within DHS sites identified in the cell type from
which the data were generated, and reads that mapped outside of
these DHS sites; we observed the oscillation pattern only for the
latter set of non-DHS site reads (Supplemental Fig. S1C).

We performed a similar analysis but focused on reads from 
only the seven LCLs that aligned to DHS sites identified in any cell 
type. We compared spacing between reads that mapped within 
DHS sites found in all LCLs (ubiquitous LCL DHS) and those that 
mapped within DHS sites found in other cell types but not LCLs 
(non-LCL DHS). We found that the characteristic oscillation was 
evident in the spacing of reads from non-LCL DHS sites, where we 



would expect nucleosomes in LCLs, but was absent for reads in
ubiquitous LCL DHS sites that would be nucleosome-depleted
(Fig. 1C). This suggests that in cell types where a DHS site is closed,
the occupying nucleosome appears to establish a position that 
contains a precise rotational setting, and further supports the 
connection of periodic digestion with the dynamic positioning of 
nucleosomes. 

Local DNase I digestion site spacing is predominantly 10.4 bp 
in length 

By examining the combined DNase-seq data, we find that some 
genomic locations with sufficiently high read density clearly dis- 
play the 10.4-bp digestion pattern on both strands, with the pos- 
itive-strand digestion peaks 3 bp downstream from the negative- 
strand digestion peaks (Fig. 2A). This was not observed in either 
FAIRE-seq or the permuted data sets. Although nucleosomes likely 
incorporate the vast majority of the genome, the 10.4-bp period in 
preferred digestion sites is not readily detectable over much of the 
genome. This inconsistency is possibly due to noise in the data, the 
imbalanced coverage of reads in the DNase-seq data, or periodic 
patterns that are not maintained across cell types. 

To better detect recurring periods of any length in the DNase I 
digestion patterns, we used Fourier transforms to decompose the 
DNase-seq data in small segments of the genome into component 
frequencies. For each overlapping 1000-bp window sliding along 
chromosome 1, we calculated the dominant period, indicating the 
most prevalent pattern in the window. Based on the Fourier anal- 
ysis results, we estimate the average period of digestion around the 
nucleosome is between 10.3 and 10.4 bp (Fig. 2B; Supplemental 
Fig. S2A). As a negative control, we similarly analyzed the permuted 
(Fig. 2C) and FAIRE-seq data (Supplemental Fig. S2B) and observed 
that the most prevalent periods were very small (<3 bp). This is 
consistent with the lack of oscillation described previously (Fig. 1B).
Hence, the 10.4-bp period is recoverable at single loci in DNase-seq 
data but not in other data sets. 
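
A windowed Fourier analysis of this kind might look like the following sketch. It assumes a per-base array of read counts for one strand; the function names and the 500-bp step between overlapping windows are assumptions, since the step size is not stated here.

```python
import numpy as np

def dominant_period(counts):
    """Return the period (bp) of the strongest non-zero frequency in one window."""
    x = np.asarray(counts, dtype=float)
    x = x - x.mean()                      # remove the mean (zero-frequency) component
    power = np.abs(np.fft.rfft(x)) ** 2   # power spectrum of the real-valued signal
    freqs = np.fft.rfftfreq(len(x))       # frequencies in cycles per base
    k = 1 + np.argmax(power[1:])          # skip the zero-frequency bin
    return 1.0 / freqs[k]

def sliding_dominant_periods(counts, window=1000, step=500):
    """Dominant period in each overlapping window (step size is an assumption)."""
    counts = np.asarray(counts)
    starts = range(0, len(counts) - window + 1, step)
    return np.array([dominant_period(counts[s:s + window]) for s in starts])
```

Note that a discrete transform over a 1000-bp window can only report periods of the form 1000/k bp, so the values nearest 10.4 bp are 1000/97 ≈ 10.31 bp and 1000/96 ≈ 10.42 bp; a histogram of dominant periods across many windows is what allows an average between these to be estimated.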

Identifying DNase I digestion patterns at individual loci

To systematically detect the 10.4-bp digestion pattern genome-wide
given variation in signal and read depth, we first created
a 91-bp pattern from the observed genome-wide DNase-seq read
spacing that represented the expected digestion around the
nucleosome (see Methods) (Fig. 1B, blue DNase line; for pictorial
representation, see Supplemental Fig. S3A). Then, we calculated the
Pearson correlation coefficient between this pattern and the
distribution of the combined DNase-seq reads in 91-bp windows
centered on each base of the genome, independently testing both
strands (Fig. 3A). High positive correlations, anticipated once per
complete DNA turn, indicated a good match to the expected
pattern in nucleosomes, while high negative correlations occurred
whenever the pattern was out of phase. Compared with FAIRE-seq
and permuted data sets, the distribution of DNase-seq correlation
scores was enriched at the extreme values (P < 10^-10) (Fig. 3B),
indicating that these correlations may be used to predict stable
rotational nucleosome positioning at base-pair resolution.
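
The per-base correlation scan can be illustrated with the sketch below. The 91-bp template itself is derived in Methods and is treated here as a given input array; `sliding_pearson` is a hypothetical name, and the approach of materializing every window in memory is chosen for clarity, not for genome-scale efficiency.

```python
import numpy as np

def sliding_pearson(counts, pattern):
    """Pearson r between `pattern` and the window of `counts` centered on each base.

    Bases too close to either end for a full window, and windows with zero
    variance, are reported as NaN. Requires NumPy >= 1.20.
    """
    x = np.asarray(counts, dtype=float)
    p = np.asarray(pattern, dtype=float)
    w = len(p)                 # odd template length, e.g. 91
    half = w // 2
    windows = np.lib.stride_tricks.sliding_window_view(x, w)
    pc = p - p.mean()
    wc = windows - windows.mean(axis=1, keepdims=True)
    numerator = wc @ pc
    denominator = np.sqrt((wc ** 2).sum(axis=1) * (pc ** 2).sum())
    with np.errstate(invalid="ignore", divide="ignore"):
        r = numerator / denominator
    scores = np.full(len(x), np.nan)
    scores[half:half + len(r)] = r   # align each window to its center base
    return scores
```

Running the function once per strand yields two score tracks. Scores near +1 mark bases where the local digestion profile is in phase with the template, and scores near -1 mark bases roughly half a helical turn away.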

To identify bases with high correlation, we first defined local 
maxima or peaks in the correlation scores and plotted the distance 
between pairs of peaks from opposite strands (for definition of 
peaks, see Methods) (Fig. 3C). From this plot, we noted that
correlation peaks (1) occurred with ~10.4-bp spacing; (2) were 2-3
bp offset between positive and negative strands, consistent with

Figure 2. The 10-bp periodic digestion can be detected at individual genomic loci. (A) The combined read counts at individual bases show the
preference for a 10-bp spacing between reads in DNase-seq data, but not in FAIRE-seq data or in permuted DNase-seq data. The 91-bp nucleosome
pattern used to identify DARNS (see Supplemental Fig. S3A) is superimposed on DNase-seq data. (B) Fourier analysis on 1000-bp windows from
chromosome 1 reveals a prominent 10-bp period within DNase-seq data. (C) The same Fourier analysis on permuted data does not reveal a meaningful
period.



the reported 3' overhang of DNase I digestion; and (3) tended to be
closely spaced, indicating multiple matches of the 91-bp pattern
across whole nucleosomes. These features were not present in the 
permuted data set (Fig. 3C, purple line). As described below, we 
exploited these characteristics in creating a model to predict re- 
gions of nucleosome rotational stability. 
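
Peak calling and the cross-strand offset plot can be sketched as follows. The exact peak definition is given in Methods and is not reproduced here; this sketch assumes a peak is a local maximum of the correlation score that also exceeds a threshold (`min_score`, an illustrative value), and both function names are hypothetical.

```python
import numpy as np

def call_peaks(scores, min_score=0.5):
    """Indices of local maxima in `scores` with value >= min_score (assumed definition)."""
    s = np.nan_to_num(np.asarray(scores, dtype=float), nan=-np.inf)
    mid = s[1:-1]
    is_peak = (mid > s[:-2]) & (mid >= s[2:]) & (mid >= min_score)
    return np.flatnonzero(is_peak) + 1

def offset_histogram(neg_peaks, pos_peaks, max_dist=30):
    """Histogram of (positive-strand peak - negative-strand peak) offsets.

    Index k of the result corresponds to an offset of k - max_dist bp.
    """
    neg = np.asarray(neg_peaks)
    pos = np.sort(np.asarray(pos_peaks))
    hist = np.zeros(2 * max_dist + 1, dtype=np.int64)
    lo = np.searchsorted(pos, neg - max_dist, side="left")
    hi = np.searchsorted(pos, neg + max_dist, side="right")
    for n, i, j in zip(neg, lo, hi):
        hist += np.bincount(pos[i:j] - n + max_dist, minlength=2 * max_dist + 1)
    return hist
```

In such a histogram, the three features listed in the text would appear as maxima repeating every ~10.4 bp, a principal maximum displaced 2-3 bp from zero, and a decay in amplitude with increasing distance.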

Predicting regions of nucleosome rotational stability 

To predict individual regions that display rotational nucleosome 
stability, we devised a hidden Markov model (HMM) to identify
sequences of DNA with average spacing of 10.4 bp between
correlation peaks on the same strand and where the positive-strand
peaks trailed the negative-strand peaks by ~3 bp. The HMM 
transitioned between a background digestion state and a nucleo- 
some meta-state composed of a cycle of states that generated the 
observed spacing between positive and negative correlation peaks 
associated with DNase I digestion of the nucleosome (for a more 
complete description, see Methods; for HMM diagram, see Supplemental
Fig. S3B). Figure 4A illustrates a representative region
depicting the combined DNase-seq data, correlation scores,
correlation peaks, and regions assigned by the HMM to the
nucleosome meta-state.
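
To give a sense of how an HMM can segment peak spacings, the sketch below decodes a deliberately simplified two-state model with the Viterbi algorithm. It is not the model used in this study, which cycles through multiple states to generate the strand-specific peak pattern. Here the observations are the gaps between consecutive same-strand correlation peaks, the "nucleosome" state emits gaps from a Gaussian centered on 10.4 bp, and the background state emits gaps uniformly. All parameter values (`sd`, `max_gap`, `stay`) are assumptions chosen for illustration.

```python
import numpy as np

def viterbi_two_state(gaps, mean=10.4, sd=0.8, max_gap=50, stay=0.9):
    """Most likely state path (0 = background, 1 = nucleosome) for a series of gaps.

    Requires at least one gap. Emissions: uniform over 1..max_gap for
    background, Gaussian density N(mean, sd) for nucleosome.
    """
    gaps = np.asarray(gaps, dtype=float)
    n = len(gaps)
    log_bg = np.full(n, -np.log(max_gap))
    log_nuc = -0.5 * ((gaps - mean) / sd) ** 2 - np.log(sd * np.sqrt(2 * np.pi))
    emit = np.vstack([log_bg, log_nuc])                  # shape (2, n)
    trans = np.log(np.array([[stay, 1 - stay],
                             [1 - stay, stay]]))         # trans[i, j]: from i to j
    score = np.empty((2, n))
    back = np.zeros((2, n), dtype=int)
    score[:, 0] = np.log(0.5) + emit[:, 0]
    for t in range(1, n):
        cand = score[:, t - 1][:, None] + trans          # cand[i, j]
        back[:, t] = np.argmax(cand, axis=0)             # best previous state for each j
        score[:, t] = cand.max(axis=0) + emit[:, t]
    path = np.empty(n, dtype=int)
    path[-1] = np.argmax(score[:, -1])
    for t in range(n - 1, 0, -1):                        # trace back the best path
        path[t - 1] = back[path[t], t]
    return path
```

Runs of state 1 in the returned path correspond to stretches of peaks spaced at roughly one helical turn, which is the property the full model uses to delimit candidate regions.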

We labeled the "nucleosome" regions called by the HMM as 
DARNS because they represent predictions of DNA sequences 
covered by consistently positioned, rotationally stable nucleo- 
somes without necessarily defining the exact boundaries of each 
nucleosome. We annotated ~14 million DARNS covering 890 million
bases (30.77%) of the genome. Across all DARNS, the mean
distance between adjacent correlation peaks on the same strand is
10.364 bp with a standard deviation of ~0.8 bp. While some DARNS cover
only a portion of the nucleosome, others are much larger than the 
average size of a nucleosome and may reflect nucleotides over which 
one or more nucleosomes maintain their DNA orientation as their 
translational position fluctuates. The DARNS ranged in length from 
15-1282 bp with a median of 50 bp. Of the three DARNS that were 
>1 kb, two mapped near centromeres; this is consistent with previous
studies recording high, stable nucleosome occupancy in pericentromeric
regions (Chodavarapu et al. 2010; Gaffney et al. 2012).

Figure 3. Correlating DNase-seq data to expected nucleosome pattern. (A) The expected 91-bp
pattern of DNase I digestion around the nucleosome (Supplemental Fig. S3A) was correlated with the
DNase-seq data at each base across the genome. (B) The distribution of correlation scores shows that [...]
permuted DNase-seq are plotted. Note that the 3-bp offset of the ~10 bp between correlation peaks on
opposite strands is only detected in the DNase data.

The distance between pairs of DARNS tended to be multiples of 
10.4 bp (Fig. 4B). This may indicate that either smaller DARNS are 
from the same nucleosome but with spurious correlation peaks 
between them that deviate from the HMM model or that phasing 
occurs between individual nucleosomes (Valouev et al. 2011; 
Gaffney et al. 2012). The latter may reflect the positions of nu- 
cleosomes in higher-order structures. 

As a control, we also used our HMM to annotate regions using 
the FAIRE-seq and permuted DNase-seq data sets. These control 
sets each predicted ~12 million regions covering ~540 million
(18.6%) and ~520 million (17.9%) of bases in the genome, respectively.
We found these control regions
did not significantly overlap DARNS de- 
fined using DNase I data (Table 1). We 
compared these annotations to the ex- 
tended nucleosome boundaries of well- 
positioned dyads inferred from in vitro 
MNase-seq experiments (Valouev et al. 
2011). We found that DNase-defined 
DARNS showed a significant overlap 
(~39 million, or 39% of 99 million bases
covered by in vitro nucleosomes; P = 
0.025), but a similar overlap was not 
present in either FAIRE-seq or permuted 
data sets (Table 1). We further found that 
annotated control regions were significantly
different from DARNS when comparing inherent
features of these annotations, including length,
read coverage, correlation scores, spacing between
annotations, and HMM posterior probability
(P < 10^-10; for complete list of 16
features, see Supplemental Material). We 
used these differences to estimate the 
false-discovery rate (FDR) by developing 
a multivariate linear regression classifier 
based on these features that distinguished 
DNase-defined DARNS from control data 
annotations. This provided a confidence 
score for each DARNS and estimated an 
FDR for DARNS of 8.14% (for details, see 
Methods). 
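
The base-pair overlaps and row percentages of the kind reported in Table 1 can be computed along the following lines. This is a sketch under stated assumptions: annotations are represented as half-open `(start, end)` intervals on a single chromosome, the function names are hypothetical, and the permutation procedure behind the P-values (described in Methods) is not reproduced.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def covered_bp(intervals):
    """Number of bases covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))

def overlap_bp(set1, set2):
    """Number of bases covered by both interval sets."""
    a, b = merge(set1), merge(set2)
    i = j = total = 0
    while i < len(a) and j < len(b):
        total += max(0, min(a[i][1], b[j][1]) - max(a[i][0], b[j][0]))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total
```

The "overlap of row" percentage is then `overlap_bp(set1, set2) / covered_bp(set1)`. As a check against the table, 311,897,617 overlapping bases divided by the 693,421,171 bases covered by UC DNase calls gives approximately 44.98%.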

To further investigate the relation- 
ship between DARNS and annotated 
nucleosome positions, we plotted the in 
vitro dyad positions, as well as in vivo 
nucleosome occupancy (Stanford Nucleo- 
some track, UCSC Genome Browser), rel- 
ative to DARNS. We first noted that the 
~600,000 in vitro dyads showed strong
correspondence with nucleosome occu- 
pancy signal from in vivo MNase-seq 
supporting their accuracy (see Methods) 
(Supplemental Fig. S4A). We found that 
in vitro dyads were prevalent around the 
midpoints of DARNS but were not around 
regions called from the permuted data 
(Fig. 4C). Interestingly, the dyad signal was 
not centered at DARNS midpoints, but 
rather showed two distinct regions of significant
enrichment ~60 bp upstream and
downstream (P < 0.05). These dual crests, along with the 50-bp 
median DARNS length, suggest that DARNS were preferentially 
positioned on either side of the dyad within the in vitro predicted 
nucleosomes but did not generally overlap the dyad. A similar 
profile of dual enrichment peaks was detected when comparing 
in vivo MNase-seq data from the GM12878 LCL (P < 0.05) (Fig. 
4D) and the K562 erythroleukemia cell line (data not shown). The 
lack of DARNS overlapping the exact center of nucleosomes may 
signify a disruption or attenuation of the digestion periodicity 
around or across the dyad, analogous to the previously reported 
reduction in 10-bp periodic dinucleotide signal at the dyad (Albert 
et al. 2007; Ioshikhes et al. 2011). We propose that digestion of the
DNA at the dyad is not as constrained as
flanking DNA that is more directly associated
with the histone core. This is supported
by the high levels of DNase-seq
mental Fig. S5A) combined with their 
low average correlation scores (Fig. 4E). 
The coordinates of DARNS and correla- 
tion peaks within DARNS are available 
at http://fureylab.web.unc.edu/datasets/ 
darns/ and in Supplemental Material. 

Properties of DARNS covering 
nucleosome halves 

To further investigate the bimodal signal 
seen in Figure 4, C and D, we compiled 
subsets of DARNS covering the upstream 
or 5' half and the downstream 3' half of 
the nucleosome based on their relative 
position to the nearest in vitro dyad and 
labeled according to the orientation of the 
positive strand in the genome assembly 
sequence. We created these separate sets to 
ensure that this analysis was not due to 
strand alignment biases or artefacts. This 
resulted in ~475,000 5' end and 498,000
3' end DARNS, with ~269,000 corresponding
to opposite sides of the same
nucleosome. With a total of ~600,000
annotated dyads, the number of dyads 
with a pair of DARNS covering both halves 
is significantly less than that expected by 
random chance (P < 10~^^); this suggests 
that our model was less likely to annotate 
both sides of the same nucleosome and 
may indicate varying exposure of nucleo- 
some halves or a difference in stable his- 
tone contacts across the nucleosome. 
When we independently compared these 
5' and 3' end DARNS to in vivo MNase-seq
data, we detected enrichment in MNase- 
seq signal on only one side of the DARNS, 
further supporting that we correctly as- 
signed the DARNS locations relative to the 
dyad (Supplemental Fig. S4B). 
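
A rough sense of the chance expectation can be obtained from the counts quoted above. The calculation below is only a back-of-the-envelope sketch: it assumes that each dyad carries at most one 5' end and one 3' end DARNS and that the two halves are annotated independently, which may differ from the null model actually used for the reported P-value.

```python
def expected_both_halves(n_dyads, n_five_prime, n_three_prime):
    """Expected number of dyads with DARNS on both halves if halves were independent.

    Equals n_dyads * (n_five_prime / n_dyads) * (n_three_prime / n_dyads).
    """
    return n_five_prime * n_three_prime / n_dyads

# Approximate counts taken from the text.
expected = expected_both_halves(600_000, 475_000, 498_000)
observed = 269_000
print(f"expected: {expected:,.0f}  observed: {observed:,}  ratio: {observed / expected:.2f}")
```

Under these assumptions the expectation is about 394,250 dyads with both halves annotated, so the ~269,000 observed is roughly two-thirds of what independence would predict, in line with the depletion described in the text.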

For each of the 5' and 3' end DARNS,
we plotted the distribution of aligned 
positive- and negative-strand DNase-seq 
read counts, as well as MNase-seq data to 
show relative locations of the nucleo- 
some and linker (for 5', see Fig. 5A; for 3', 
see Fig. 5B). In both plots, the region of 
the nucleosome covered by the DARNS 
demonstrated the strongest periodicity 
relative to the linker and the opposite side 
of the nucleosome. We observed a de- 
crease in overall read counts in the pre- 
sumed linker region, indicating reduced 
DNase I digestion between nucleosomes. 
A similar plot of DNase-seq read counts 
relative to in vitro dyads does not display

Table 1. Overlap of HMM nucleosome calls

Data set                Regions        Coverage (bp)    Genome covered (%)
In vitro nucleosomes    ~600,000       ~99 million      -
Duke DNase              ~14 million    ~890 million     30.77
Permuted DNase          12,069,380     518,081,818      17.88
FAIRE                   11,773,074     537,821,557      18.56
UC DNase                12,882,169     693,421,171      23.93

Overlap of Set 1 (rows) with Set 2 (columns)

Set 1            Measure              In vitro       Duke DNase     Permuted DNase  FAIRE          UC DNase
In vitro         Overlap (bp)         -              ~39 million    19,118,288      20,097,313     31,586,925
                 Overlap of row (%)   -              39.19          19.37           20.36          32.01
                 [P-value]            -              [0.025]        [0.56]          [0.54]         [0.010]
Duke DNase       Overlap (bp)         ~39 million    -              174,951,169     182,401,882    311,897,617
                 Overlap of row (%)   4.34           -              19.62           20.46          34.98
                 [P-value]            [0.025]        -              [0.65]          [0.59]         [<1e-10]
Permuted DNase   Overlap (bp)         19,118,288     174,951,169    -               105,530,733    140,257,510
                 Overlap of row (%)   3.69           33.77          -               20.37          27.07
                 [P-value]            [0.56]         [0.65]         -               [0.61]         [0.29]
FAIRE            Overlap (bp)         20,097,313     182,401,882    105,530,733     -              135,696,098
                 Overlap of row (%)   3.74           33.91          19.62           -              25.23
                 [P-value]            [0.54]         [0.59]         [0.61]          -              [0.93]
UC DNase         Overlap (bp)         31,586,925     311,897,617    140,257,510     135,696,098    -
                 Overlap of row (%)   4.56           44.98          20.23           19.57          -
                 [P-value]            [0.010]        [<1e-10]       [0.29]          [0.93]         -

Data sets that show significant overlap (P < 0.05) are shaded light gray. 



periodicity, further supporting that the midpoints of DARNS are
not dyads (Supplemental Fig. S5A). 

To test whether this pattern was due to a bias in DNase I di- 
gestion at the nucleotide level, we analyzed these regions in DNase- 
seq data from naked DNA extracted from the K562 cell line. We did 
detect a periodicity in digestion at these same sites that matched 
what we observed in fully constituted nucleosomal DNA, but at 
a greatly reduced amplitude (Supplemental Fig. S6) suggesting that 
this was not the predominant cause of this digestion pattern. 

We also investigated dinucleotide base frequencies in these 
subsets of DARNS. Periodic weak (W = A/T) dinucleotides have 
been associated with well-positioned nucleosomes and tend to be 
out of phase with strong (S = G/C) dinucleotides (Satchwell et al. 
1986; Segal et al. 2006). We noted that in both 5' end DARNS 
(Fig. 5C) and 3' end DARNS (Fig. 5D), SS dinucleotide levels were 
highest in DNA incorporated into the nucleosome and in phase 
with DARNS peaks, while WW dinucleotide levels were highest 
in the linker and out of phase with DARNS peaks. This is consis- 
tent with previous reports of nucleosomes with GC-rich cores and 
AT-rich flanks (Valouev et al. 2011) as well as SS dinucleotides 
aligning to exposed positions where the minor groove faces out- 
ward from the histone. A similar trend was evident in an equiva- 
lent plot relative to the in vitro dyads (Supplemental Fig. S5B). Not 
surprisingly, the periodicity in each plot was again strongest over 
the region of the nucleosome covered by the DARNS, supporting 
the relationship between rotational stability and dinucleotide pe- 



riodicity. To determine whether the periodicity could be recovered 
at the in vitro dyads, we aligned them by the nearest negative- 
strand correlation peak. The periodicity in dinucleotide frequency 
was indeed visible and fairly symmetrical across the surrounding 
region (Supplemental Fig. S5C). Taken together, these results sup- 
port the periodic features of DNA as it is incorporated into the 
nucleosome. We also note that as expected, all of these results are 
symmetric for the 5' and 3' end DARNS when adjusting for ori- 
entation to the dyad. 
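
The dinucleotide profiles discussed in this section can be tabulated with a short routine such as the one below. It is a sketch that assumes the sequences have already been extracted at equal length and aligned on a common reference point (for example, a DARNS midpeak); the function name and input format are illustrative.

```python
WEAK = set("AT")     # W = A/T
STRONG = set("CG")   # S = G/C

def dinucleotide_profile(seqs):
    """Fraction of sequences with a WW or SS dinucleotide starting at each position.

    seqs: equal-length DNA strings aligned on the same reference point.
    Returns (ww, ss), each a list of length len(seq) - 1.
    """
    n = len(seqs[0]) - 1
    ww = [0] * n
    ss = [0] * n
    for seq in seqs:
        seq = seq.upper()
        for i in range(n):
            first, second = seq[i], seq[i + 1]
            if first in WEAK and second in WEAK:
                ww[i] += 1
            elif first in STRONG and second in STRONG:
                ss[i] += 1
    total = len(seqs)
    return [c / total for c in ww], [c / total for c in ss]
```

Plotting the two returned profiles against position relative to the alignment point would show whether WW and SS frequencies oscillate out of phase with one another, as reported here for the nucleosome half covered by the DARNS.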

Comparison to DARNS in a single cell type and their
relationship to genomic features 

DNase-seq data (~1.26 billion reads) from 70 HapMap Yoruba
LCLs were recently generated by the Pritchard and Gilad
laboratories at the University of Chicago (UC) using the same DNase-seq
protocol (Degner et al. 2012). To demonstrate the reproducibility
and enable analysis in a single cell type, we annotated DARNS using
these data. The UC data produced ~13 million DARNS covering
~690 million (23.9%) bases of the genome (Table 1). The UC data
showed the same patterns as the Duke DNase-seq data with
respect to aligned read spacing (Supplemental Fig. S7A), Fourier
analysis (Supplemental Fig. S7B), and distribution of correlation
scores compared with permuted DNase-seq data (P < 10^-10)
(Supplemental Fig. S7C). Likewise, the UC DARNS significantly
overlapped in vitro dyads (31.6 million bases, or 32% of the in vitro

Figure 5. Properties of DARNS that map to the 5' end or 3' end of nucleosomes. In vitro dyads (Valouev et al. 2011) were used to distinguish DARNS
that mapped to either the 5' end (A,C) or the 3' end (B,D) of the nucleosome (for details, see Methods). (A) DNase reads aligned by centermost negative-
strand correlation peak (midpeak) of DARNS that map to the 5' end of the nucleosome exhibit a greater oscillation pattern and DNase signal on the 5' end
of the nucleosome compared to either the linker or the 3' end of the nucleosome. In vivo LCL MNase-seq signal is superimposed on top (black line) to help
designate the locations of the nucleosome and the linker (indicated at bottom). (B) Same as A, but for DARNS that map to the 3' end of the nucleosome.
(C,D) Same plot as A and B, but showing dinucleotide (W = A/T, S = C/G) frequency. Note that the central peak in the SS dinucleotide signal aligns to the
midpeak of the DARNS, suggesting that they occur when the minor groove is exposed.



nucleosomes; P < 0.010), as well as with the positions of Duke 
DARNS called from the 49 samples (312 million bases, or 35% of 
Duke DARNS; P < 10^-10) (Table 1, bottom row). Therefore, the
DNase I digestion pattern is a reproducible feature of this DNase- 
seq protocol and leads to consistent DARNS annotations from 
independent DNase-seq experiments. 

Promoters often contain a nucleosome-free region (NFR) at the 
transcription start site (TSS) with a -1 nucleosome and a prominent 
+1 nucleosome followed by decreasingly well-positioned nucleo- 
somes over the gene body (Jiang and Pugh 2009). We found a sig- 
nificant depletion in Duke (Fig. 6A) and UC (Fig. 6B) DARNS at TSSs 
compared with intergenic regions (P < 10~^^), but a 20% enrich- 
ment around the promoter area compared with background levels 
(P < 10^-10), particularly downstream. Similarly, we found reduced
DARNS density immediately downstream from the transcription 
termination site (TTS) (Supplemental Fig. S8A). Gene deserts had 
reduced levels of DARNS despite the fact that in aggregate the 
10.4-bp period was still present (see Methods) (Supplemental Fig. 
S8A,C). Thus, our data were in agreement with the established 
profile of nucleosomes around genes. 

CpG islands, often found at gene promoters, occur in various 
sizes and have been shown to occlude nucleosome binding (Fenouil 
et al. 2012). When we plot the distribution of DARNS around CpG 
islands categorized by varying length thresholds, we find that the 
area depleted of DARNS increases with the size of the CpG islands
(Supplemental Fig. S8B), supporting the interference of nucleosome
formation by these regions. Moreover, we find that DNase-seq reads 
mapping to CpG islands do not exhibit the periodicity associated 
with nucleosomes (Supplemental Fig. S8C).

Active DHS sites are nucleosome depleted by definition, and 
DHS has been shown to correlate with the strength of nearby 
nucleosome positioning (Gaffney et al. 2012). Both Duke (Fig. 6C) 
and UC (Fig. 6D) DARNS demonstrated a strong depletion in 
ubiquitous DHS sites detected in all cell types included in this 
study compared with random intergenic sites (P < 10"^^). Ubiq- 
uitous LCL DHS sites, found in all seven LCL samples from the 
Duke data, also demonstrated DARNS depletion for both sets (P <
10^-10). These features indicate that DARNS reproduce the de-
pletion of nucleosomes at DHS sites that are active in all relevant
cell types. 

We used the UC DARNS from the LCLs to explore their profile 
at variably open DHS sites present in only LCLs compared with the 
non-LCL cell types in the Duke data. To do this, we divided the 
cumulative set of DARNS into three subsets: (1) 6.7 million Duke- 
specific DARNS covering 315 million bases (10.9% of the genome); 
(2) 5 million UC-specific DARNS covering 207 million bases (7.2%); 
and (3) 9 million DARNS annotated by both covering 312 million 
bases (10.8%). We considered the Duke-specific DARNS to be 
largely ubiquitous because they were only found by the diverse 
cell type data and considered the UC-specific DARNS to be cell-type-specific

Figure 6. Properties of DARNS around transcription start sites (TSSs) and DHS sites. (y-axis) The normalized density of DARNS. DARNS identified from
Duke (left) and the University of Chicago (UC; right) (Degner et al. 2012) DNase-seq data are depleted around TSS (A,B) and enriched in surrounding areas
(particularly downstream toward the gene body) relative to random intergenic coordinates. (C,D) DARNS (left, Duke; right, UC) are most depleted around
ubiquitous DHS sites present in all cell types (ubiquitous DHS), as well as ubiquitous DHS sites identified in all LCL cell lines (ubiquitous LCL). DARNS were
identified as Duke-specific (E), UC-specific (F), and shared by both data sets (G). (E) Duke-specific DARNS are depleted around non-LCL DHS sites (yellow)
and enriched around LCL-specific DHS sites (green). (F) In contrast, UC-specific DARNS are enriched for non-LCL DHS sites and depleted around LCL-specific
DHS sites. (G) DARNS shared by both data sets show similar patterns around these DHS sites.

because they were only found by the LCL data. We found
a Duke-specific enrichment (P < 10~^^) (Fig. 6E) and UC-specific 
depletion (P < 10"^^) (Fig. 6F) of DARNS over the middle of DHS 
sites present in LCLs but not in other cell types (LCL-specific 
DHS). In contrast, DHS sites present only in non-LCL cell types 
(non-LCL DHS) showed the reverse trend with a Duke-specific de- 
pletion (P < 10"^^) and UC-specific enrichment (P < 10"^^) of 
DARNS. The shared DARNS showed similar profiles for both sets of 
DHS sites (Fig. 6G). Likewise, when we analyzed DHS sites detected 
in a single non-LCL cell type (HUVEC), Duke-specific DARNS and 
shared DARNS were depleted (P < 10^-10) (Supplemental Fig. S9).
This indicates that DARNS common to Duke and UC data were 
among the strongest and most conserved across all cell types, while
cell-type-specific DARNS were depleted from DHS sites present 
within that cell type and enriched in cell types where the DHS site 
was absent. Therefore, this suggests that many inactive regulatory 
regions are occupied by rotationally stable nucleosomes. 

Discussion 

In this study, we provide evidence that the rotational positioning 
of nucleosomes at individual loci can be annotated from genome- 
wide DNase I digestion patterns derived from DNase-seq data and 
that many regions of the genome maintain nucleosome rotational 
stability across diverse cell types. In addition to enabling us to ex- 
plore ubiquitously stable nucleosomes, we found evidence of cell- 
type-specific nucleosome positioning by comparing to a second, 
independent DNase-seq data set from a single cell type. With future 
higher throughput sequencing, it will be possible to use our model 
on single-cell-type DNase-seq experiments to further explore cell 
type specificity. We note that using DNase-seq in this manner rep-
resents an approach distinct from MNase-seq for uncovering properties
of nucleosomes and annotating their positions. Although DNase-seq will
likely not supplant MNase-seq for identifying translational nucle- 
osome positions, it can provide a complementary view of stable 
nucleosomes and provide unique high-resolution information on 
rotational positioning. 

DARNS provide spatial information regarding the orientation 
of DNA in a nucleosome. Therefore, they can be used to give context 
to or align other features of DNA such as nucleotide frequencies and 
regulatory motifs. This was illustrated by aligning dinucleotide 
periodicities around in vitro dyads (Supplemental Fig. S5C). Since
DNase I is known to digest in the minor groove of DNA, positions 
corresponding to correlation peaks in DARNS will indicate where 
the minor grooves are likely facing away from the histone surface 
(Noll 1974; Cousins et al. 2004). A comparable connection was 
shown between the rotational positioning of nucleosomes and DNA 
methylation in the minor groove (Chodavarapu et al. 2010). 

Many DARNS appear to cover only one-half of annotated 
nucleosomes. We hypothesize that the periodic pattern that forms 
the basis of our model is weaker at the dyad, preventing DARNS 
from extending across to the opposite half. Since a nucleosome is 
composed of DNA wrapped around two histone tetramers, we 
suggest the periodic constraint imposed on DNase digestion is re- 
laxed as it transitions from one histone tetramer to the next. This 
may be related to the loss in conservation of the dinucleotide pe-
riodicity at the dyad (Albert et al. 2007; Ioshikhes et al. 2011).
Additionally, it has been proposed that pairs of nucleosomes in- 
corporated into higher-order structures, like 30-nm fibers, are 
asymmetrically protected from DNase digestion (Staynov 2000), 
which may contribute to DARNS mapping to only one side of the 
nucleosome but not extending across the dyad. 



The positioning of ubiquitous and cell-type-specific DARNS 
may help our understanding of cell-specific gene regulation. Our 
results suggested that although variably open DHS sites were 
strongly depleted of DARNS when active, DARNS reappear in cell 
types where the DHS sites are closed. For regions that contain 
a well-positioned nucleosome, our ability to determine the ori- 
entation of the major and minor grooves relative to the histone 
surface may allow us to better understand how TFs initially access 
these nucleosome-bound cis-regulatory elements in response to
changing cell conditions. For example, a binding site that maps to 
DNA in a nucleosome can be exposed on cue by chromatin 
remodelers that evict the nucleosome or shift the rotational set- 
tings (Jiang and Pugh 2009). Several models have been proposed 
for estimating TF binding while taking into account predicted nu- 
cleosome occupancy (Narlikar et al. 2007; Raveh-Sadka et al. 2009); 
these may benefit from incorporating relevant information from in 
vivo DARNS. 

Finally, DARNS may be used as a starting point for investi- 
gating higher-order nucleosome structures in vivo. There are two 
proposed structures for how linear arrays of nucleosomes form 
30-nm fibers (Tremethick 2007). These hierarchical organizations 
of nucleosomes are likely to form inaccessible regions that may 
result in a recognizable pattern of DNase I digestion (Staynov 
2000). This may be reflected as higher-order patterns in our data.
Nucleosomes are also further compacted into chromosomal struc- 
tures like centromeres, telomeres, and heterochromatin (van Holde 
and Zlatanova 1995). DNase-seq data may be able to contribute to 
our understanding of how nucleosomes are incorporated into these 
chromosome architectures, which will help to elucidate how the 
genome is spatially organized. 



Methods 

DNase-seq and FAIRE-seq data

The Duke DNase-seq data is a composite of the results from the 
following 49 samples (seven LCLs and 42 unique cell types) with 
replicates: GM12891, GM12892, GM12878, GM19238, GM19239, 
GM19240, GM18507, A549, Chorion, CLL, D721, E_myoblast, 
FB0167P, FB8470, Fibroblasts_park, FSHD_myoblast, H1_ES, H54, 
H9_ES, HelaS3, HelaS3_IFNA, Hepatocytes, HepG2, HMEC, HPDE6, 
Huh7_5, Huh7, HUVEC, iPS, K562, LHSR, LHSR_Induced, LnCAP,
LnCAP_Induced, MCF7, Melanocyte, Myoblast, Myometrial, Myo-
tube, NHEK, Osteoblast, Pancreatic_islets_dedif, Pancreatic_islets,
PATu8988T, SM_SFM, Stellate, T47, TE, Trophoblast. The UNC 
Chapel Hill FAIRE-seq data is a composite of the results from 19 
samples (a subset of the 49) with replicates from the Lieb laboratory 
(Giresi et al. 2007): GM12891, GM12892, GM12878, GM19238, 
GM18507, A549, D721, H1_ES, H54, HelaS3, HelaS3_IFNA, HepG2, 
HUVEC, K562, LHSR, LHSR_Induced, NHEK, Pancreatic_islets,
Trophoblast. More information on these cell types can be found 
at genome.ucsc.edu/ENCODE/cellTypes.html. The UC DNase-seq 
data originated from LCLs from 70 individuals with replicates 
(Degner et al. 2012). 

The Naked DNA data set was generated by purifying total 
genomic DNA from unfixed K562 cells using phenol extraction 
and ethanol precipitation. This naked DNA was treated with DNase 
I in the same manner as the above cell types and used to generate 
a DNase-seq library. 

All Duke DNase-seq and UNC FAIRE-seq data were aligned to 
the hg19 Human Genome Assembly. The raw UC DNase-seq data
were initially aligned to hg18, but DARNS are given in hg19 (see
Nucleosome annotations and comparisons). DNase-seq and FAIRE-seq
data were generated as part of the ENCODE Project and are available
at the UCSC Genome Browser (http://genome.ucsc.edu/ENCODE)
or at the NCBI Gene Expression Omnibus (GEO)
(http://www.ncbi.nlm.nih.gov/geo) under accession numbers GSE32970 (DNase-seq)
and GSE35239 (FAIRE-seq). 

Combining multiple cell types 

Sequence files for individual cell types consisted of reads combined 
across all replicates. In our merged data, we counted, for each base, 
the number of cell types with an aligned read starting at that base, 
considering each strand separately. The value at each base on each 
strand was used as the representative aligned sequence read count 
for that base. This was done for each of the Duke DNase-seq, UC 
DNase-seq, and FAIRE-seq data sets. 
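
The merging step can be illustrated with a minimal Python sketch. It assumes that reads are already pooled across replicates and reduced to (chromosome, strand, start) tuples; the function name merge_cell_types and the toy input are hypothetical and are not part of the original pipeline.

```python
from collections import defaultdict

def merge_cell_types(reads_by_cell_type):
    """Count, for each stranded base, how many cell types have a read starting there.

    reads_by_cell_type maps a cell-type name to an iterable of
    (chrom, strand, start) tuples, with replicates already pooled.
    """
    counts = defaultdict(int)
    for reads in reads_by_cell_type.values():
        # A set collapses repeated read starts within one cell type, so each
        # cell type contributes at most 1 to any stranded position.
        for site in set(reads):
            counts[site] += 1
    return dict(counts)

# Toy input (illustrative values only).
example = {
    "GM12878": [("chr1", "+", 100), ("chr1", "+", 100), ("chr1", "-", 150)],
    "K562": [("chr1", "+", 100), ("chr1", "+", 120)],
    "HepG2": [("chr1", "-", 150)],
}
merged = merge_cell_types(example)
```

Under this reading, the value at a base is bounded by the number of cell types, so a single deeply sequenced sample cannot dominate the merged signal.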

Fourier transform analysis 

The fft (fast Fourier transform) function in MATLAB was used 
to calculate the Fourier transform for 1000-bp sliding windows 
(100-bp overlap). The dominant frequency in each window was set 
to the maximum of all frequencies greater than 0.001. These values
were inverted to calculate the period and used to create the his- 
togram of the dominant periods across all windows. 
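
As a rough illustration of this analysis, the following Python sketch computes a dominant period per window with a direct discrete Fourier transform. It is a stand-in for the MATLAB fft call rather than a reproduction of it. We assume here that "maximum of all frequencies" means the frequency with the largest Fourier magnitude, and the function names and the synthetic signal are hypothetical.

```python
import cmath
import math

def dominant_period(window, min_freq=0.001):
    """Return 1/f for the frequency f > min_freq with the largest DFT magnitude."""
    n = len(window)
    best_freq, best_mag = None, -1.0
    for k in range(1, n // 2 + 1):
        freq = k / n  # cycles per base
        if freq <= min_freq:
            continue
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window))
        if abs(coeff) > best_mag:
            best_freq, best_mag = freq, abs(coeff)
    return 1.0 / best_freq

def sliding_windows(signal, size=1000, overlap=100):
    """Yield windows of `size` bases that share `overlap` bases with the next one."""
    step = size - overlap
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]

# Synthetic read-count track with an exact 10-base period (illustrative only).
track = [1.0 + math.cos(2 * math.pi * t / 10) for t in range(380)]
periods = [dominant_period(w) for w in sliding_windows(track, size=200, overlap=20)]
```

A histogram of such per-window periods over real data would be the analog of the distribution described above; the direct transform used here is quadratic in window length and is meant only for small examples.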

Pattern of DNase I digestion around the nucleosome
and correlation peaks 

The frequencies of distances between DNase I digestion sites 
(y-axis) were extracted for the distances 31-76 bases (x-axis) from 
the plot shown in Figure 1B. This subset was selected to avoid the
artifact at ~20 bp resulting from sequence length. Moreover, this
pattern is large enough to capture the DNase digestion period but 
small enough to match multiple times within typical nucleosome 
regions. We "flattened" these values by setting minima points to 
zero and rescaling the remaining values accordingly and then 
mirroring this pattern to the left to create a symmetric pattern of 
DNase I digestion around the nucleosome (Supplemental Fig. S3A).
For each strand, we calculated the correlation between the nucle-
osome pattern and the read counts in 91-bp sliding windows across
the genome and assigned the score to the center base. We esti- 
mated the significance of the difference in variance between the 
distributions of DNase and permuted correlations using an F-test. 
Correlation peaks were defined as local maxima that were
bounded by negative correlation values. 
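
The template construction and peak calling described above can be sketched in Python as follows. Several details are assumptions made for illustration: "flattening" is implemented as min-max rescaling, the correlation is taken to be Pearson's r, and a peak is the highest-scoring position in a run of positive scores that has a negative score immediately on each side. The half-pattern is synthetic and all function names are hypothetical.

```python
import math

def build_template(freqs):
    """Rescale a 46-value half-pattern to [0, 1] and mirror it to 91 values."""
    lo, hi = min(freqs), max(freqs)
    flat = [(f - lo) / (hi - lo) for f in freqs]  # assumption: min-max rescaling
    return flat[:0:-1] + flat  # mirror to the left, sharing the first point

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def sliding_correlation(counts, template):
    """Correlate each full window with the template; score goes to the center base."""
    half = len(template) // 2
    scores = [0.0] * len(counts)
    for center in range(half, len(counts) - half):
        scores[center] = pearson(counts[center - half:center + half + 1], template)
    return scores

def correlation_peaks(scores):
    """Indices of the maximum of each positive run flanked by negative scores."""
    peaks, i, n = [], 0, len(scores)
    while i < n:
        if scores[i] > 0:
            j = i
            while j < n and scores[j] > 0:
                j += 1
            if i > 0 and j < n and scores[i - 1] < 0 and scores[j] < 0:
                peaks.append(max(range(i, j), key=scores.__getitem__))
            i = j
        else:
            i += 1
    return peaks

# Synthetic half-pattern for distances 31..76 (not the values from Figure 1B).
half_pattern = [math.cos(2 * math.pi * d / 10.4) for d in range(31, 77)]
template = build_template(half_pattern)
counts = [0.0] * 10 + template + [0.0] * 10
scores = sliding_correlation(counts, template)
toy_peaks = correlation_peaks([-0.2, 0.1, 0.5, 0.3, -0.1, 0.0, 0.4, -0.3, 0.2])
```

Mirroring 46 values around a shared central point gives 2 x 46 - 1 = 91 values, which matches the 91-bp window used for the genome-wide scan.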

HMM for determining DARNS 

Genomic positions were labeled as corresponding to a positive 
correlation peak (1), a negative correlation peak (-1), or neither/ 
both (0), with each strand being considered separately. The resulting 
strings of labels were input into an HMM that consisted of 14 states 
with transitions as depicted in Supplemental Figure S3B. In the 
single background state, the emission probabilities for the base la-
bels (-1, 0, 1) were empirically derived from random genomic re-
gions. The remaining "nucleosome" states represented the expected
cycle between positive and negative correlation peaks within the 
nucleosome; the path through the states as well as the associated 
emission and transition probabilities were derived from observed 
frequencies and distances between peaks. A region with the desired 
pattern, which we refer to as a DARNS, starts and ends in the nu- 
cleosome state representing either the negative (-1) or positive (1) 
peak. Then, the number of zero or "no peak" states between cor- 
relation peak states reflects the observed probable spacing. The 
hmmviterbi function in MATLAB was used to determine the optimal
path through the HMM for each chromosome. The MidPeak
was set to the middlemost negative peak within each
DARNS. We trained a regression classifier to estimate the FDR 
as described in the Supplemental Material. Coordinates, confi- 
dence scores, and correlation peaks for DARNS are available at 
http://fureylab.web.unc.edu/datasets/darns/ and in Supplemental 
Material. 
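
To make the decoding step concrete, the sketch below runs the Viterbi algorithm on a deliberately simplified four-state model: background (B), negative peak (N), positive peak (P), and a "no peak" gap state (G). This is not the 14-state model of Supplemental Figure S3B, and every probability shown is invented for illustration rather than derived from data. The Python code stands in for the MATLAB hmmviterbi call, and the helper darns_intervals is hypothetical.

```python
import math

STATES = ["B", "N", "G", "P"]
START = {"B": 0.97, "N": 0.01, "G": 0.01, "P": 0.01}
TRANS = {  # invented transition probabilities; missing entries mean 0
    "B": {"B": 0.90, "N": 0.05, "P": 0.05},
    "N": {"G": 0.90, "B": 0.10},
    "G": {"G": 0.60, "N": 0.15, "P": 0.15, "B": 0.10},
    "P": {"G": 0.90, "B": 0.10},
}
EMIT = {  # invented emission probabilities over the labels -1, 0, 1
    "B": {-1: 0.05, 0: 0.90, 1: 0.05},
    "N": {-1: 0.90, 0: 0.05, 1: 0.05},
    "G": {-1: 0.01, 0: 0.98, 1: 0.01},
    "P": {-1: 0.05, 0: 0.05, 1: 0.90},
}

def _log(p):
    return math.log(p) if p > 0 else float("-inf")

def viterbi(labels):
    """Most probable state path for a sequence of peak labels (log space)."""
    score = {s: _log(START[s]) + _log(EMIT[s][labels[0]]) for s in STATES}
    back = []
    for obs in labels[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + _log(TRANS[r].get(s, 0.0)))
            new_score[s] = (score[prev] + _log(TRANS[prev].get(s, 0.0))
                            + _log(EMIT[s][obs]))
            pointers[s] = prev
        back.append(pointers)
        score = new_score
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

def darns_intervals(path):
    """(first peak, last peak) index pairs for each run of non-background states."""
    intervals, first, last = [], None, None
    for i, s in enumerate(path + ["B"]):  # sentinel closes a trailing run
        if s == "B":
            if first is not None:
                intervals.append((first, last))
            first = last = None
        elif s in ("N", "P"):
            first = i if first is None else first
            last = i
    return intervals

quiet_path = viterbi([0, 0, 0, 0, 0])
peak_path = viterbi([0, 0, -1, 0, 0, 0, 1, 0, 0, 0, -1, 0, 0])
regions = darns_intervals(peak_path)
```

In this toy model a region is reported from its first to its last peak state, mirroring the requirement that a DARNS start and end in a peak state; the real model additionally encodes the expected spacing between peaks through its chain of "no peak" states.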



Nucleosome annotations and comparisons 

The program featureBits, part of the UCSC toolbox (Kent et al. 
2002), was used to calculate base overlaps between genomic regions 
annotated as DARNS, control regions, and extended in vitro dyads. 
P-values for the percentage of bases in the overlaps were calculated 
using hypergeometric tests. The locations of in vitro dyads inferred 
from well-positioned nucleosomes were based on MNase-seq ex- 
periments performed in the Sidow laboratory on naked DNA com- 
bined with histones (Valouev et al. 2011). To determine overlap, we 
generated in vitro nucleosome boundaries by extending 80 bp in
both directions from the dyad. The program liftOver, also part of the 
UCSC toolbox, was used to translate the positions of both the in 
vitro nucleosomes and the UC DARNS from the hg18 to the hg19 as-
sembly (Kent et al. 2002). We removed sites that corresponded to
"blacklisted" regions as designated by the ENCODE project and 
available within the UCSC Genome Browser, Mappability annota- 
tion track (http://genome.ucsc.edu/; A. Kundaje and E. Birney, 
unpubl.). The regions of significant enrichment in the in vitro 
dyad profile (Fig. 4C) scored P < 0.05 after Bonferroni correction 
of the Poisson distribution derived from the permuted data. The 
MNase-seq data sets for the GM12878 LCL and the K562 leukemia 
cell line were produced by the Snyder laboratory and are also 
available within the UCSC Genome Browser, Nucleosome Position 
by MNase-seq from ENCODE/Stanford/BYU (http://genome.ucsc.edu/;
M. Snyder, D. Raha, S. Johnson, E. Winters, A. Sidow, Z. Weng,
C. Smith, P. Lacroute, P. Cayting, A. Kundaje, unpubl.). These data 
were downloaded as a processed file with normalized nucleosome 
occupancy scores for each base. The regions of enrichment in the 
GM12878 MNase-seq occupancy signal profile (Fig. 4D) were sig- 
nificant at P < 0.05 after Bonferroni correction based on the Gaussian 
distribution derived from the permuted data. 
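
The overlap test can be sketched on a toy scale in Python. The code below extends dyads by 80 bp on each side, counts shared bases, and evaluates the upper tail of a hypergeometric distribution with exact integer arithmetic. It is not the featureBits implementation; the function names, the 1000-base "genome", and the intervals are hypothetical, and treating the P-value as the upper tail P(X >= k) is our assumption.

```python
from math import comb

def extend_dyads(dyads, flank=80):
    """Half-open nucleosome intervals spanning `flank` bases on each side of a dyad."""
    return [(max(0, d - flank), d + flank + 1) for d in dyads]

def covered_bases(intervals):
    """Set of base positions covered by half-open intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end))
    return bases

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when `draws` bases are sampled from `population` bases,
    of which `successes` lie inside nucleosomes."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favorable = sum(comb(successes, i) * comb(population - successes, draws - i)
                    for i in range(k, upper + 1))
    return favorable / total

# Toy example on a 1000-base "genome" (illustrative values only).
nucleosomes = covered_bases(extend_dyads([100, 400]))
darns = covered_bases([(90, 150), (500, 560)])
overlap = len(nucleosomes & darns)
p_value = hypergeom_upper_tail(overlap, 1000, len(nucleosomes), len(darns))
small_check = hypergeom_upper_tail(2, 10, 4, 3)  # (6*6 + 4) / 120 = 1/3
```

Exact binomial coefficients are practical only for small examples; genome-scale counts of hundreds of millions of bases would require a log-space or library implementation.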



Distribution of DARNS around genomic features 

For comparison, we chose ~20,000 random intergenic sites from
large alignable regions (>2500 bp, excluding repetitive regions and
alignment gaps) across the genome. We determined the locations 
of TSS, TTS, gene deserts (>100 kb without genes), and CpG islands
using annotations from the UCSC Genome Browser downloaded 
for hg19 (http://genome.ucsc.edu/). DHS sites are annotated ac-
cording to the method described previously (Song et al. 2011), and
the sets are defined as in Supplemental Table S1. To investigate the
distribution of DARNS around genomic features, we summed the 
number of instances in which DARNS overlapped each base around 
the reference point (TSS, TTS, or midpoint) of the feature (DHS site, 
CpG Island, gene desert, or random intergenic) and binned into 
25-bp windows. Then, we normalized for the number of sites in 
the set of genomic features (Fig. 6; Supplemental Fig. S8A,B) or the 
number of bases covered by each type of DARNS (Supplemental 
Fig. S9). For the UC DARNS, we used the translated hg19 assembly
positions. P-values were calculated using the statistic by com- 
paring the number of DARNS covering the window of the TSS or 
DHS midpoint with the number covering random intergenic sites. 
Fold enrichment of promoter regions was calculated similarly by 
comparing to the average random level. 
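
The aggregation described in this section can be sketched in Python as follows. The sketch assumes a single chromosome, ignores strand, uses half-open DARNS intervals, and normalizes by the number of sites; the flank size, the function name darns_profile, and the toy coordinates are hypothetical choices made for illustration.

```python
def darns_profile(ref_points, darns, flank=1000, bin_size=25):
    """Per-site DARNS coverage in bins of `bin_size` bases around reference points.

    ref_points: reference positions (e.g., TSS or DHS midpoints).
    darns: half-open (start, end) intervals on the same chromosome.
    """
    n_bins = (2 * flank) // bin_size
    bins = [0] * n_bins
    for ref in ref_points:
        window_start = ref - flank
        for start, end in darns:
            lo = max(start, window_start)
            hi = min(end, ref + flank)
            # Each covered base adds 1 to the bin holding its offset.
            for pos in range(lo, hi):
                bins[(pos - window_start) // bin_size] += 1
    # Normalize by the number of sites in the feature set.
    return [b / len(ref_points) for b in bins]

# Toy example (illustrative coordinates only).
profile = darns_profile([1000, 5000], [(950, 1000), (5025, 5050)],
                        flank=100, bin_size=25)
```

Comparing such a profile with the same profile computed at random intergenic sites gives the enrichment or depletion reported in Figure 6.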